LAMP - TR - 065 CAR - TR - 962 CS - TR - 4218 4400019848

Author

  • Tapas Kanungo
Abstract

Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated degradation model is used to estimate the probability of an ideal binary pattern, given the noisy observed pattern. This probability is estimated by degrading noise-free document images and then computing the frequency of corresponding noise-free and noisy pattern pairs. This conditional probability is then used to construct a lookup table to restore the noisy images. The impact of the restoration process is then quantified by computing the decrease in OCR word and character error rates. We find that, given the estimated degradation model parameter values, the restoration algorithm decreases the character error rate by 16.1% and the word error rate by 7.35%. In some categories of degradation (e.g., model parameters that give rise to broken characters) there is a 41.5% reduction in character error rate and a 20.4% reduction in word error rate.
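The lookup-table idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the noise model is a stand-in that flips each pixel independently with a fixed probability (the paper's actual degradation model and its parameter estimation step are not reproduced here), the pattern is a 3x3 window, and only the center pixel of the ideal pattern is restored. All function names (`degrade`, `pattern_codes`, `build_lookup_table`, `restore`) are hypothetical.

```python
import numpy as np


def degrade(ideal, flip_prob, rng):
    """Stand-in noise model (an assumption): flip each pixel independently."""
    flips = rng.random(ideal.shape) < flip_prob
    return np.where(flips, 1 - ideal, ideal)


def pattern_codes(img):
    """Encode the 3x3 neighborhood of every pixel as an integer in [0, 511]."""
    padded = np.pad(img, 1)  # pad with background (0) so border pixels get a full window
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.int64)
    for dy in range(3):
        for dx in range(3):
            # Row-major bit order: the center pixel ends up in bit 4.
            codes = codes * 2 + padded[dy:dy + h, dx:dx + w]
    return codes


def build_lookup_table(ideal_images, flip_prob, rng, trials=20):
    """Count (noisy pattern, ideal center pixel) pairs and keep the likelier value."""
    counts = np.zeros((512, 2), dtype=np.int64)
    for ideal in ideal_images:
        for _ in range(trials):
            noisy = degrade(ideal, flip_prob, rng)
            np.add.at(counts, (pattern_codes(noisy).ravel(), ideal.ravel()), 1)
    # Patterns never observed during training keep their own center pixel.
    table = (np.arange(512) >> 4) & 1
    seen = counts.sum(axis=1) > 0
    table[seen] = np.argmax(counts[seen], axis=1)
    return table


def restore(noisy, table):
    """Replace each pixel by the table entry for its observed 3x3 pattern."""
    return table[pattern_codes(noisy)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ideal = np.zeros((40, 40), dtype=np.uint8)
    ideal[10:30, 8:14] = 1   # two vertical strokes and a crossbar,
    ideal[10:30, 24:30] = 1  # a crude synthetic "character"
    ideal[18:22, 8:30] = 1
    table = build_lookup_table([ideal], flip_prob=0.05, rng=rng)
    noisy = degrade(ideal, 0.05, rng)
    restored = restore(noisy, table)
    print("pixel errors before:", int(np.sum(noisy != ideal)))
    print("pixel errors after: ", int(np.sum(restored != ideal)))
```

Choosing the most frequent ideal value for each observed pattern is the empirical counterpart of picking the value with the highest conditional probability. In this sketch the flip probability is simply given, whereas the abstract describes estimating the degradation parameters from the degraded image first.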


Similar resources

Morphological Degradation Models and their Use in Document Image Restoration (Qigong Zheng and Tapas Kanungo)

Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated de...


LAMP - TR - 145 CS - TR - 4877 UMIACS - TR - 2007 - 36 HCIL - 2007 - 10 July 2007 Exploring the Effectiveness of Related Article Search in PubMed

We describe two complementary studies that explore the effectiveness of related article search in PubMed. The first attempts to characterize the topological properties of document networks that are implicitly defined by this capability. The second focuses on analysis of PubMed query logs to gain an understanding of real user behavior. Combined evidence suggests that related article search is bo...


CAR - TR - 854 N 00014 - 96 - 1 - 0521 CS - TR - 3780 March 1997

Many types of common objects, such as tools and vehicles, usually move in simple ways when they are wielded or driven: The natural axes of the object tend to remain aligned with the local trihedron defined by the object's trajectory. Based on this observation we use a model called Frenet-Serret motion which corresponds to the motion of a moving trihedron along a space curve. Knowing how the Fre...


LAMP - TR - 119 CS - TR - 4695 UMIACS - TR - 2005 - 04 February 2005 Automatically Evaluating Answers to Definition Questions

Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called Pourpre, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions involves manual determination of whether an information nugget appears in a...


CAR - TR - 673 April 1993 CS - TR - 3078 ISR - 93 - 52 AlphaSlider : Searching Textual Lists with Sliders

AlphaSlider is a query interface that uses a direct manipulation slider to select words, phrases, or names from an existing list. This paper introduces a prototype of AlphaSlider, describes the design issues, reports on an experimental evaluation, and offers directions for further research. The experiment tested 24 subjects selecting items from lists of 40, 80, 160, and 320 entries. Mean select...




Publication date: 2001